1 A graphical model for protein secondary structure prediction
Wei Chu, Zoubin Ghahramani, David L Wild 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf(366.19 KB) Additional Information: full citation, abstract, references

In this paper, we present a graphical model for protein secondary structure prediction. This 
model extends segmental semi-Markov models (SSMM) to exploit multiple sequence 
alignment profiles which contain information from evolutionarily related sequences. A novel 
parameterized model is proposed as the likelihood function for the SSMM to capture the 
segmental conformation. By incorporating the information from long range interactions in
β-sheets, this model is capable of carrying out infere ...



2 The interaction of knowledge sources in word sense disambiguation 
Mark Stevenson, Yorick Wilks 

September 2001 Computational Linguistics, volume 27 issue 3 
Full text available: pdf(2.16 MB), Publisher Site Additional Information: full citation, abstract, references

Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from 
the tradition of combining different knowledge sources in artificial intelligence research. An
important step in the exploration of this hypothesis is to determine which linguistic 
knowledge sources are most useful and whether their combination leads to improved 
results. We present a sense tagger which uses several knowledge sources. Tested accuracy 
exceeds 94% on our evaluation corpus. Our system attempts ... 

3 Research track posters: Privacy-preserving Bayesian network structure computation
on distributed heterogeneous data
Rebecca Wright, Zhiqiang Yang 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(217.22 KB) Additional Information: full citation, abstract, references, index terms

As more and more activities are carried out using computers and computer networks, the 
amount of potentially sensitive data stored by business, governments, and other parties 
increases. Different parties may wish to benefit from cooperative use of their data, but 
privacy regulations and other privacy concerns may prevent the parties from sharing their 
data. Privacy-preserving data mining provides a solution by creating distributed data mining 
algorithms in which the underlying data is not reveal ... 
Keywords: Bayesian network, distributed databases, privacy-preserving data mining



4 KDD-99 conference reports: Profiling your customers using Bayesian networks 
Paola Sebastiani, Marco Ramoni, Alexander Crea 
January 2000 ACM SIGKDD Explorations Newsletter, volume 1 issue 2

Full text available: pdf(1.22 MB) Additional Information: full citation, abstract

This report describes a complete Knowledge Discovery session using Bayesware Discoverer, 
a program for the induction of Bayesian networks from incomplete data. We build two 
causal models to help an American Charitable Organization understand the characteristics 
of respondents to direct mail fund raising campaigns. The first model is a Bayesian network 
induced from the database of 96,376 lapsed donors to the June '97 renewal mailing. The
network describes the dependency of the probability of resp ...

Keywords: Bayesian networks, customer profiling, missing data 
5 Web search 1: Searching web databases by structuring keyword-based queries
Pavel Calado, Altigran S. da Silva, Rodrigo C. Vieira, Alberto H. F. Laender, Berthier A. Ribeiro-Neto

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management

Full text available: pdf(204.22 KB) Additional Information: full citation, abstract, references, citings, index terms

On-line information services have become widespread in the Web nowadays. However, Web 
users are non-specialized and have a great variety of interests. Thus, interfaces for Web 
databases must be simple and uniform. In this paper we present an approach, based on 
Bayesian networks, for querying Web databases using keywords only. According to this 
approach, the user inputs a query through a simple search-box interface. From the input 
query, one or more plausible structured queries are derived and su ... 

Keywords: query structuring, structured queries, web databases 



6 Poster: Bayesian face recognition using Gabor features
Xiaogang Wang, Xiaoou Tang 

November 2003 Proceedings of the 2003 ACM SIGMM workshop on Biometrics methods 
and applications 

Full text available: pdf(512.87 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we propose a new face recognition approach combining a Bayesian 
probabilistic model and Gabor filter responses. Since both the Bayesian algorithm and the 
Gabor features can reduce intrapersonal variation through different mechanisms, we 
integrate the two methods to take full advantage of both approaches. The efficacy of the 
new method is demonstrated by the experiments on 1180 face images from the XM2VTS 
database and 1260 face images from the AR database. 

Keywords: Bayesian analysis, Gabor Wavelet, face recognition 



7 Poster papers: Mining complex models from arbitrarily large databases in constant
time

Geoff Hulten, Pedro Domingos 
July 2002 Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining

Full text available: pdf(853.58 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we propose a scaling-up method that is applicable to essentially any induction 
algorithm based on discrete search. The result of applying the method to an algorithm is 
that its running time becomes independent of the size of the database, while the decisions 
made are essentially identical to those that would be made given infinite data. The method 
works within pre-specified memory limits and, as long as the data is iid, only requires 
accessing it sequentially. It gives anytime resu ... 

Keywords: Bayesian networks, Hoeffding bounds, discrete search, scalable learning 
algorithms, subsampling 



8 A Bayesian decision model for cost optimal record matching 
V. S. Verykios, G. V. Moustakides, M. G. Elfeky 

May 2003 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 12 Issue 1 

Full text available: pdf(80.87 KB) Additional Information: full citation, abstract, index terms

In an error-free system with perfectly clean data, the construction of a global view of the 
data consists of linking - in relational terms, joining - two or more tables on their key fields. 
Unfortunately, most of the time, these data are neither carefully controlled for quality nor 
necessarily defined commonly across different data sources. As a result, the creation of 
such a global data view resorts to approximate joins. In this paper, an optimal solution is 
proposed for the matching or the lin ... 

Keywords: Cost optimal statistical model, Data cleaning, Record linkage 
9 Task clustering and gating for bayesian multitask learning
Bart Bakker, Tom Heskes 

December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf(229.33 KB) Additional Information: full citation, abstract, references, citings, index terms

Modeling a collection of similar regression or classification tasks can be improved by making 
the tasks 'learn from each other'. In machine learning, this subject is approached through
'multitask learning', where parallel tasks are modeled as multiple outputs of the same 
network. In multilevel analysis this is generally implemented through the mixed-effects 
linear model where a distinction is made between 'fixed effects', which are the same for all 
tasks, and 'random effects', which may vary bet ... 



10 Research track papers: Interestingness of frequent itemsets using Bayesian networks
as background knowledge
Szymon Jaroszewicz, Dan A. Simovici 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(191.90 KB) Additional Information: full citation, abstract, references, index terms

The paper presents a method for pruning frequent itemsets based on background 
knowledge represented by a Bayesian network. The interestingness of an itemset is defined 
as the absolute difference between its support estimated from data and from the Bayesian 
network. Efficient algorithms are presented for finding interestingness of a collection of 
frequent itemsets, and for finding all attribute sets with a given minimum interestingness. 
Practical usefulness of the algorithms and their efficiency ... 
Keywords: Bayesian network, association rule, background, frequent itemset,
interestingness, knowledge



11 Industry track papers: On the potential of domain literature for clustering and Bayesian
network learning

Peter Antal, Patrick Glenisson, Geert Fannes 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(1.10 MB) Additional Information: full citation, abstract, references, index terms

Thanks to its increasing availability, electronic literature can now be a major source of 
information when developing complex statistical models where data is scarce or contains 
much noise. This raises the question of how to integrate information from domain literature 
with statistical data. Because quantifying similarities or dependencies between variables is a 
basic building block in knowledge discovery, we consider here the following question. Which 
vector representations of text and which st ... 

Keywords: Bayesian networks, clustering, data mining, text mining 



12 Research track papers: A Bayesian network framework for reject inference 
Andrew Smith, Charles Elkan 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(201.00 KB) Additional Information: full citation, abstract, references, index terms

Most learning methods assume that the training set is drawn randomly from the population 
to which the learned model is to be applied. However in many applications this assumption 
is invalid. For example, lending institutions create models of who is likely to repay a loan 
from training sets consisting of people in their records to whom loans were given in the 
past; however, the institution approved loan applications previously based on who was 
thought unlikely to default. Learning from only appro ... 

Keywords: Bayesian networks, Heckman estimator, expectation-maximization, propensity 
scores, reject inference, sample selection bias 



13 Posters: Combining speech and haptics for intuitive and efficient navigation through 
image databases 

Thomas Kaster, Michael Pfeiffer, Christian Bauckhage 

November 2003 Proceedings of the 5th international conference on Multimodal 
interfaces 

Full text available: pdf(239.65 KB) Additional Information: full citation, abstract, references, index terms

Given the size of today's professional image databases, the standard approach to object- or
theme-related image retrieval is to interactively navigate through the content. But as most
users of such databases are designers or artists who do not have a technical background,
navigation interfaces must be intuitive to use and easy to learn. This paper reports on
efforts towards this goal. We present a system for intuitive image retrieval that features
different modalities for interaction. Apart f ...

Keywords: content-based image retrieval, fusion of haptics, multimodal interface
evaluation, speech, vision processing
14 Context-specific Bayesian clustering for gene expression data
Yoseph Barash, Nir Friedman 

April 2001 Proceedings of the fifth annual international conference on Computational
biology

Full text available: pdf(233.32 KB) Additional Information: full citation, abstract, references, citings, index terms

The recent growth in genomic data and measurement of genome-wide expression patterns 
allows to examine gene regulation by transcription factors using computational tools. In this 
work, we present a class of mathematical models that help in understanding the 
connections between transcription factors and functional classes of genes based on genetic 
and genomic data. These models represent the joint distribution of transcription factor 
binding sites and of expression levels of a gene in a single ... 

15 Classification and browsing: Structuring keyword-based queries for web databases
Rodrigo C. Vieira, Pavel Calado, Altigran S. da Silva, Alberto H. F. Laender, Berthier A. Ribeiro-Neto

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 

Full text available: pdf(16.95 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a framework, based on Bayesian belief networks, for querying Web 
databases using keywords only. According to this framework, the user inputs a query 
through a simple search-box. From the input query, one or more plausible structured 
queries are derived and submitted to Web databases. The results are then retrieved and 
presented to the user as ranked answers. To evaluate our framework, an experiment using 
38 example queries was carried out. We found out that 97% of the time, ... 

Keywords: bayesian belief networks, web databases, web query 



16 Scalable algorithms for mining large databases
Rajeev Rastogi, Kyuseok Shim 

August 1999 Tutorial notes of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(4.11 MB) Additional Information: full citation, references, index terms



17 Special issue on the fusion of domain knowledge with data for decision support: Fusion
of domain knowledge with data for structural learning in object oriented domains 

Helge Langseth, Thomas D. Nielsen 

December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf(227.18 KB) Additional Information: full citation, abstract, references, index terms

When constructing a Bayesian network, it can be advantageous to employ structural 
learning algorithms to combine knowledge captured in databases with prior information 
provided by domain experts. Unfortunately, conventional learning algorithms do not easily 
incorporate prior information, if this information is too vague to be encoded as properties 
that are local to families of variables. For instance, conventional algorithms do not exploit 
prior information about repetitive structures, which are ... 

18 Selectivity estimation using probabilistic models
Lise Getoor, Benjamin Taskar, Daphne Koller

May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, volume 30 issue 2

Full text available: pdf(525.74 KB) Additional Information: full citation, abstract, references, citings, index terms

Estimating the result size of complex queries that involve selection on multiple attributes 
and the join of several relations is a difficult but fundamental task in database query 
processing. It arises in cost-based query optimization, query profiling, and approximate 
query answering. In this paper, we show how probabilistic graphical models can be 
effectively used for this task as an accurate and compact approximation of the joint 
frequency distribution of multiple attributes across multiple ... 

19 Queries and aggregation: Cleaning and querying noisy sensors
Eiman Elnahrawy, Badri Nath 

September 2003 Proceedings of the 2nd ACM international conference on Wireless 
sensor networks and applications 

Full text available: pdf(256.08 KB) Additional Information: full citation, abstract, references, citings, index terms

Sensor networks have become an important source of data with numerous applications in 
monitoring various real-life phenomena as well as industrial applications and traffic control. 
Unfortunately, sensor data is subject to several sources of errors such as noise from 
external sources, hardware noise, inaccuracies and imprecision, and various environmental 
effects. Such errors may seriously impact the answer to any query posed to the sensors. In 
particular, they may yield imprecise or even incorre ... 

Keywords: bayesian theory, noisy sensors, query evaluation, statistics, uncertainty, 
wireless sensor networks 



20 Scalable feature selection, classification and signature generation for organizing large
text databases into hierarchical topic taxonomies 
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan 
August 1998 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 7 Issue 3 

Full text available: pdf(281.37 KB) Additional Information: full citation, abstract, citings, index terms

We explore how to organize large text databases hierarchically by topic to aid better 
searching, browsing and filtering. Many corpora, such as internet directories, digital 
libraries, and patent databases are manually organized into topic hierarchies, also called 
taxonomies. Similar to indices for relational data, taxonomies make search and access more 
efficient. However, the exponential growth in the volume of on-line textual information 
makes it nearly impossible to maintain such taxono ... 
21 The use of lexicons in information retrieval in legal databases
J. C. Smith

June 1997 Proceedings of the sixth international conference on Artificial intelligence 
and law 

Full text available: pdf(1.70 MB) Additional Information: full citation, references, index terms



22 Industrial/government track: Empirical Bayesian data mining for discovering patterns in
post-marketing drug safety 

David M. Fram, June S. Almenoff, William DuMouchel 

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(461.25 KB) Additional Information: full citation, abstract, references, index terms

Because of practical limits in characterizing the safety profiles of therapeutic products prior 
to marketing, manufacturers and regulatory agencies perform post-marketing surveillance 
based on the collection of adverse reaction reports ("pharmacovigilance"). The resulting
databases, while rich in real-world information, are notoriously difficult to analyze using 
traditional techniques. Each report may involve multiple medicines, symptoms, and 
demographic factors, and there is no easily linked inf ... 

Keywords: association rules, data mining, empirical Bayes methods, pharmacovigilance, 
post-marketing surveillance



23 Special issue on the fusion of domain knowledge with data for decision support: 
Combining knowledge from different sources in causal probabilistic models 
Marek J. Druzdzel, Francisco J. Díez

December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf(140.32 KB) Additional Information: full citation, abstract, references, index terms

Building probabilistic and decision-theoretic models requires a considerable knowledge 
engineering effort in which the most daunting task is obtaining the numerical parameters. 
Authors of Bayesian networks usually combine various sources of information, such as 
textbooks, statistical reports, databases, and expert judgement. In this paper, we 
demonstrate the risks of such a combination, even when this knowledge encompasses such 
seemingly population-independent characteristics as sensitivity and ... 



http://portal.acm.org/results.cfntf^ 12/23/04 



Results (page 2): bayesian and jury and database Page 2 of 6 



24 Knowledge discovery in databases: tools and techniques 
Peggy Wright 

November 1998 Crossroads, volume 5 issue 2

Full text available: html(28.84 KB) Additional Information: full citation, index terms



25 Learning Bayesian classification rules through genetic algorithms
Christoph F. Eick, Daw Jong 

December 1993 Proceedings of the second international conference on Information and 
knowledge management 

Full text available: pdf(848.77 KB) Additional Information: full citation, references, index terms



26 Reports from KDD-2001: KDD Cup 2001 report
Jie Cheng, Christos Hatzis, Hisashi Hayashi, Mark-A. Krogel, Shinichi Morishita, David Page, 
Jun Sese 

January 2002 ACM SIGKDD Explorations Newsletter volume 3 issue 2 

Full text available: pdf(1.96 MB) Additional Information: full citation, abstract, references, citings

This paper presents results and lessons from KDD Cup 2001. KDD Cup 2001 focused on 
mining biological databases. It involved three cutting-edge tasks related to drug design and 
genomics. 

Keywords: Competition, biology, drug design, genomics 



27 Automatically structured and translated queries: The effectiveness of automatically 
structured queries in digital libraries 

Marcos André Gonçalves, Edward A. Fox, Aaron Krowne, Pavel Calado, Alberto H. F. Laender,
Altigran S. da Silva, Berthier Ribeiro-Neto 
June 2004 

Full text available: pdf(295.40 KB) Additional Information: full citation, abstract, references, index terms

Structured or fielded metadata is the basis for many digital library services, including 
searching and browsing. Yet, little is known about the impact of using structure on the 
effectiveness of such services. In this paper, we investigate a key research question: do 
structured queries improve effectiveness in DL searching? To answer this question, we 
empirically compared the use of unstructured queries to the use of structured queries. We 
then tested the capability of a simple Bayesian network s ... 

Keywords: bayesian networks, digital libraries, structured queries 



28 Tutorial database mining 
Rakesh Agrawal 

May 1994 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on 
Principles of database systems

Full text available: pdf(123.97 KB) Additional Information: full citation, abstract, references, index terms

We view database mining as the efficient construction and verification of models of patterns 
embedded in large databases. Many of the database mining problems have been motivated 
by the practical decision support problems faced by most large retail organizations. In the 
Quest project at the IBM Almaden Research center, we have focussed on three classes of 
database mining problems involving classification, associations, and sequences. In this
tutorial, I will draw upon my Quest experience to ... 

29 The true lift model: a novel data mining approach to response modeling in database
marketing
Victor S. Y. Lo 

December 2002 ACM SIGKDD Explorations Newsletter, volume 4 issue 2 

Full text available: pdf(119.81 KB) Additional Information: full citation, abstract, references

In database marketing, data mining has been used extensively to find the optimal customer 
targets so as to maximize return on investment. In particular, using marketing campaign 
data, models are typically developed to identify characteristics of customers who are most 
likely to respond. While these models are helpful in identifying the likely responders, they 
may be targeting customers who have decided to take the desirable action or not 
regardless of whether they receive the campaign contact (e ... 

Keywords: customer development, customer relationship management, data mining, 
database marketing, interaction effect, knowledge discovery, predictive modeling, response 
modeling, treatment effect, true lift, upselling and cross-selling 



30 The psychology of multimedia databases Q 
Mark G. L. M. van Doorn, Arjen P. de Vries

June 2000 Proceedings of the fifth ACM conference on Digital libraries 

Full text available: pdf(1.43 MB) Additional Information: full citation, abstract, references, citings, index terms

Multimedia information retrieval in digital libraries is a difficult task for computers in
general. Humans on the other hand are experts in perception, concept representation,
knowledge organization and memory retrieval. Cognitive psychology and science describe
how cognition works in humans, but can offer valuable clues to information retrieval
researchers as well. Cognitive psychologists view the human mind as a general-purpose
symbol-processing system that interacts with the ...

Keywords: Marr's theory of perception, Paivio's dual coding theory, cognitive psychology
and information retrieval, user and domain knowledge in query formulation



31 Accepted Posters: Information filtering using bayesian networks: effective user 
interfaces for aviation weather data 
Corinne Clinton Ruokangas, Ole J. Mengshoel 

January 2003 Proceedings of the 8th international conference on Intelligent user 
interfaces 

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, index terms

Weather is a complex, dynamic process with tremendous impact on aviation. While pilots 
often have access to large amounts of aviation weather data, they find it difficult and time- 
consuming to identify weather hazards, due to the sheer amount and cryptic formatting of 
the data. To address this challenge, we have developed information filtering concepts based 
on a unified Bayesian network model, integrating text and graphical weather data in the 
context of specific mission, equipment and personal ... 

Keywords: bayesian models, bayesian networks, data filtering, information management,
intelligent visualization, situation awareness
32 A Bayesian approach toward active learning for collaborative filtering
Rong Jin, Luo Si 

July 2004 Proceedings of the 20th conference on Uncertainty in artificial intelligence

Full text available: pdf(378.07 KB) Additional Information: full citation, abstract, references

Collaborative filtering is a useful technique for exploiting the preference patterns of a group 
of users to predict the utility of items for the active user. In general, the performance of 
collaborative filtering depends on the number of rated examples given by the active user. 
The more the number of rated examples given by the active user, the more accurate the 
predicted ratings will be. Active learning provides an effective way to acquire the most 
informative rated examples from active user ... 

33 Video retrieval: Semi-supervised learning for facial expression recognition 
Ira Cohen, Nicu Sebe, Fabio G. Cozman, Thomas S. Huang 

November 2003 Proceedings of the 5th ACM SIGMM international workshop on 
Multimedia information retrieval 

Full text available: pdf(341.70 KB) Additional Information: full citation, abstract, references, index terms

Automatic classification by machines is one of the basic tasks required in any pattern 
recognition and human computer interaction applications. In this paper, we discuss training 
probabilistic classifiers with labeled and unlabeled data. We provide an analysis which 
shows under what conditions unlabeled data can be used in learning to improve 
classification performance. We discuss the implications of this analysis to a specific type of 
probabilistic classifiers, Bayesian networks, and propose a ... 

Keywords: Bayesian networks, facial expression recognition, semi-supervised learning 



34 Optimal sample cost residues for differential database batch query problems 
Dan E. Willard 

January 1991 Journal of the ACM (JACM), volume 38 issue 1

Full text available: pdf(1.09 MB) Additional Information: full citation, abstract, references, citings, index terms, review

In many computing applications, there are several equivalent algorithms capable of 
performing a particular task, and no one is the most efficient under all statistical 
distributions of the data. In such contexts, a good heuristic is to take a sample of the 
database and use it to guess which procedure is likely to be the most efficient. This paper 
defines the very general notion of a differentiable query problem and shows that the ideal
sample size for guessing the optimal choice of algorith ... 

Keywords: databases, sampling 



35 Analysis methodology: Simulation of large networks: propagation of uncertainty in a 
simulation-based maritime risk assessment model utilizing Bayesian simulation 
techniques 

Jason R. W. Merrick, Varun Dinesh, Amita Singh, J. Rene van Dorp, Thomas A. Mazzuchi 
December 2003 Proceedings of the 35th conference on Winter simulation: driving 
innovation 

Full text available: pdf(606.38 KB) Additional Information: full citation, abstract, references

Recent studies in the assessment of risk in maritime transportation systems have used 
simulation-based probabilistic techniques. Amongst them are the San Francisco Bay (SFB) 
Ferry exposure assessment in 2002, the Washington State Ferry (WFS) Risk Assessment in 
1998 and the Prince William Sound (PWS) Risk Assessment in 1996. Representing 
uncertainty in such simulation models is fundamental to quantifying system risk. This paper 
illustrates the representation of uncertainty in simulation using ...

36 Applying general Bayesian techniques to improve TAN induction 
Jesus Cerquides 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on
Knowledge discovery and data mining

Full text available: pdf(515.13 KB) Additional Information: full citation, references, citings, index terms



37 Content-based filtering & collaborative filtering: A nonparametric hierarchical bayesian
framework for information filtering 
Kai Yu, Volker Tresp, Shipeng Yu 

July 2004 Proceedings of the 27th annual international conference on Research and 
development in information retrieval 

Full text available: pdf(425.46 KB) Additional Information: full citation, abstract, references, index terms

Information filtering has made considerable progress in recent years. The predominant 
approaches are content-based methods and collaborative methods. Researchers have 
largely concentrated on either of the two approaches since a principled unifying framework 
is still lacking. This paper suggests that both approaches can be combined under a 
hierarchical Bayesian framework. Individual content-based user profiles are generated and 
collaboration between various user models is achieved via a co ... 

Keywords: collaborative filtering, content-based filtering, dirichlet process, nonparametric 
bayesian modelling 



38 Probabilistic object bases 

Thomas Eiter, James J. Lu, Thomas Lukasiewicz, V. S. Subrahmanian 
September 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 3 

Full text available: pdf(663.73 KB) Additional Information: full citation, abstract, references, index terms

Although there are many applications where an object-oriented data model is a good way of 
representing and querying data, current object database systems are unable to handle 
objects whose attributes are uncertain. In this article, we extend previous work by 
Kornatzky and Shimony to develop an algebra to handle object bases with uncertainty. We 
propose concepts of consistency for such object bases, together with an NP-completeness 
result, and classes of probabilistic object bases for which consi ... 

Keywords: Consistency, object-oriented database, probabilistic object algebra, 
probabilistic object base, probability, query language, query optimization 






39 A retrieval model incorporating hypertext links 
W. B. Croft, H. Turtle 

November 1989 Proceedings of the second annual ACM conference on Hypertext 

Full text available: pdf(769.84 KB) Additional Information: full citation , references , citings , index terms 



40 Propositional and relational Bayesian networks associated with imprecise and 
qualitative probabilistic assessments 

Fabio Gagliardi Cozman, Cassio Polpo de Campos, Jaime Shinsuke Ide, Jose Carlos Ferreira da 
Rocha 

July 2004 Proceedings of the 20th conference on Uncertainty in artificial intelligence 

Full text available: pdf(340.75 KB) Additional Information: full citation , abstract , references 

This paper investigates a representation language with flexibility inspired by probabilistic 
logic and compactness inspired by relational Bayesian networks. The goal is to handle 
propositional and first-order constructs together with precise, imprecise, indeterminate and 
qualitative probabilistic assessments. The paper shows how this can be achieved through 
the theory of credal networks. New exact and approximate inference algorithms based on 
multilinear programming and iterated/loopy propaga ... 

41 PAC-Bayesian generalisation error bounds for Gaussian process classification 
Matthias Seeger 

March 2003 The Journal of Machine Learning Research, volume 3 

Full text available: pdf(487.11 KB) Additional Information: full citation , abstract , references , index terms 

Approximate Bayesian Gaussian process (GP) classification techniques are powerful non- 
parametric learning methods, similar in appearance and performance to support vector 
machines. Based on simple probabilistic models, they render interpretable results and can 
be embedded in Bayesian frameworks for model selection, feature selection, etc. In this 
paper, by applying the PAC-Bayesian theorem of McAllester (1999a), we prove distribution- 
free generalisation error bounds for a wide range of approxima ... 



Keywords: Bayesian learning, Gaussian processes, Gibbs classifier, Kernel machines, PAC- 
Bayesian framework, convex duality, generalisation error bounds, sparse approximations 



42 Learning equivalence classes of Bayesian-network structures 
David Maxwell Chickering 

March 2002 The Journal of Machine Learning Research, volume 2 

Full text available: pdf(442.83 KB) Additional Information: full citation , abstract , references , citings , index terms 

Two Bayesian-network structures are said to be equivalent if the set of 
distributions that can be represented with one of those structures is identical to the set of 
distributions that can be represented with the other. Many scoring criteria that are used to 
learn Bayesian-network structures from data are score equivalent; that is, 
these criteria do not distinguish among networks that are equivalent. In this paper, we 
consider using a score equivalent ... 

43 A hierarchical access control model for video database systems 

Elisa Bertino, Jianping Fan, Elena Ferrari, Mohand-Said Hacid, Ahmed K. Elmagarmid, 
Xingquan Zhu 

April 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 2 

Full text available: pdf(6.27 MB) Additional Information: full citation , abstract , references , index terms 

Content-based video database access control is becoming very important, but it depends on 
the progresses of the following related research issues: (a) efficient video analysis for 
supporting semantic visual concept representation; (b) effective video database indexing 
structure; (c) the development of suitable video database models; and (d) the development 
of access control models tailored to the characteristics of video data. In this paper, we 
propose a novel approach to support multilevel acce ... 

Keywords: Video database models, access control, indexing schemes 



44 QProber: A system for automatic classification of hidden-Web databases 
Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami 

January 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 1 
Full text available: pdf(3.62 MB) Additional Information: full citation , abstract , references , index terms 

The contents of many valuable Web-accessible databases are only available through search 
interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web 
sites have started to manually organize Web-accessible databases into Yahoo!-like 
hierarchical classification schemes. Here we introduce QProber, a modular system that 
automates this classification process by using a small number of query probes, generated 
by document classifiers. QProber can use a variety of types of ... 

Keywords: Database classification, Web databases, hidden Web 



45 Probabilistic temporal databases. I: algebra 
Alex Dekhtyar, Robert Ross, V. S. Subrahmanian 

March 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 1 

Full text available: pdf(878.03 KB) Additional Information: full citation , abstract , references , citings , index terms 

Dyreson and Snodgrass have drawn attention to the fact that, in many temporal database 
applications, there is often uncertainty about the start time of events, the end time of 
events, and the duration of events. When the granularity of time is small (e.g., 
milliseconds), a statement such as "Packet p was shipped sometime during the first 5 days 
of January, 1998" leads to a massive amount of uncertainty (5x24x60x60x1000) 
possibilities. A ... 

46 Query evaluation techniques for large databases 
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), volume 25 issue 2 

Full text available: pdf(9.37 MB) Additional Information: full citation , abstract , references , citings , index terms , review 

Database management systems will continue to manage large data volumes. Thus, efficient 
algorithms for accessing and manipulating large sets and sequences will be required to 
provide acceptable performance. The advent of object-oriented and extensible database 
systems will not solve this problem. On the contrary, modern data models exacerbate the 
problem: In order to manipulate large sets of complex objects as efficiently as today's 
database systems manipulate simple records, query- processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 

August 2004 The Journal of Machine Learning Research, volume 5 

Full text available: pdf(154.18 KB) Additional Information: full citation , abstract , index terms 

Bagging frequently improves the predictive performance of a model. An online version has 
recently been introduced, which attempts to gain the benefits of an online algorithm while 
approximating regular bagging. However, regular online bagging is an approximation to its 
batch counterpart and so is not lossless with respect to the bagging operation. By operating 
under the Bayesian paradigm, we introduce an online Bayesian version of bagging which is 
exactly equivalent to the batch Bayesian version ... 

48 Learning Bayesian network classifiers by maximizing conditional likelihood 
Daniel Grossman, Pedro Domingos 

July 2004 Twenty-first international conference on Machine learning 

Full text available: pdf(187.23 KB) Additional Information: full citation , abstract , references 

Bayesian networks are a powerful probabilistic representation, and their use for 
classification has received considerable attention. However, they tend to perform poorly 
when learned in the standard way. This is attributable to a mismatch between the objective 
function used (likelihood or a function thereof) and the goal of classification (maximizing 
accuracy or conditional likelihood). Unfortunately, the computational cost of optimizing 
structure and parameters for conditional likelihood is pro ... 

49 Using Bayesian networks to analyze expression data 
Nir Friedman, Michal Linial, Iftach Nachman, Dana Pe'er 
April 2000 Proceedings of the fourth annual international conference on 

Computational molecular biology 

Full text available: pdf(952.91 KB) Additional Information: full citation , abstract , references , citings 

DNA hybridization arrays simultaneously measure the expression level for thousands of 
genes. These measurements provide a "snapshot" of transcription levels within the cell. A 
major challenge in computational biology is to uncover, from such measurements, 
gene/protein interactions and key biological features of cellular systems. 

In this paper, we propose a new framework for discovering interactions between genes 
based on multiple expression measurements. This framework buil ... 

50 Industrial/government track: Clinical and financial outcomes analysis with existing 
hospital patient records 

R. Bharat Rao, Sathyakama Sandilya, Radu Stefan Niculescu, Colin Germond, Harsha Rao 
August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(188.40 KB) Additional Information: full citation , abstract , references , index terms 

Existing patient records are a valuable resource for automated outcomes analysis and 
knowledge discovery. However, key clinical data in these records is typically recorded in 
unstructured form as free text and images, and most structured clinical information is 
poorly organized. Time-consuming interpretation and analysis is required to convert these 
records into structured clinical data. Thus, only a tiny fraction of this resource is utilized. 
We present REMIND, a Bayesian Framework for Reliable ... 

Keywords: Bayes Nets, HMMs, data mining, temporal reasoning 

Sharma Chakravarthy, Alp Aslandogan, Ramez Elmasri, Leonidas Fegaras, JungHwan Oh 
March 2003 ACM SIGMOD Record, volume 32 issue 1 

52 A survey on wavelet applications in data mining 
Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara 

December 2002 ACM SIGKDD Explorations Newsletter, volume 4 issue 2 

Full text available: pdf(330.06 KB) Additional Information: full citation , abstract , references , citings 

Recently there has been significant development in the use of wavelet methods in various 
data mining processes. However, no comprehensive survey has been available 
on the topic. The goal of this paper is to fill the void. First, the paper presents a high-level 
data-mining framework that reduces the overall process into smaller components. Then 
applications of wavelets for each component are reviewed. The paper concludes by 
discussing the impact of wavelets on data mining research an ... 

53 Statistical methods I: Bayesian analysis of massive datasets via particle filters 
Greg Ridgeway, David Madigan 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(896.64 KB) Additional Information: full citation , abstract , references , index terms 

Markov Chain Monte Carlo (MCMC) techniques revolutionized statistical practice in the 
1990s by providing an essential toolkit for making the rigor and flexibility of Bayesian 
analysis computationally practical. At the same time the increasing prevalence of massive 
datasets and the expansion of the field of data mining has created the need to produce 
statistically sound methods that scale to these large problems. Except for the most trivial 
examples, current MCMC methods require a complete scan o ... 

54 On the automation of physical database design 
Sunil Choenni, Henk M. Blanken, Thiel Chang 

March 1993 Proceedings of the 1993 ACM/SIGAPP symposium on Applied computing: 
states of the art and practice 

Full text available: pdf(865.40 KB) Additional Information: full citation , references , index terms 



Keywords: Dempster-Shafer theory, generating storage schemes, modeling rules of 
thumb, physical database design 



55 Mining lesion-deficit associations in a brain image database 
Vasileios Megalooikonomou, Christos Davatzikos, Edward H. Herskovits 
August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(893.76 KB) Additional Information: full citation , references , index terms 



56 Unsupervised Bayesian visualization of high-dimensional data 
Petri Kontkanen, Jussi Lahtinen, Petri Myllymaki, Henry Tirri 

August 2000 Proceedings of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(160.91 KB) Additional Information: full citation , references , index terms 

57 Image Retrieval: Extraction of feature subspaces for content-based retrieval using 
relevance feedback 
Zhong Su, Stan Li, Hongjiang Zhang 

October 2001 Proceedings of the ninth ACM international conference on Multimedia 



In the past few years, relevance feedback (RF) has been used as an effective solution for 
content-based image retrieval (CBIR). Although effective, the RF-CBIR framework does not 
address the issue of feature extraction for dimension reduction and noise reduction. In this 
paper, we propose a novel method for extracting features for the class of images 
represented by the positive images provided by subjective RF. Principal Component Analysis 
(PCA) is used to reduce both noise contained in the orig ... 

Keywords: Bayesian estimation, content-based image retrieval (CBIR), principal 
component analysis (PCA), relevance feedback 



58 New direction for uncertainty reasoning in deductive databases 
U. Güntzer, W. Kießling, H. Thöne 

April 1991 ACM SIGMOD Record , Proceedings of the 1991 ACM SIGMOD international 

conference on Management of data, volume 20 issue 2 
Full text available: pdf(923.93 KB) Additional Information: full citation , references , citings , index terms 



59 Efficient reasoning 
Russell Greiner, Christian Darken, N. Iwan Santoso 
March 2001 ACM Computing Surveys (CSUR), volume 33 issue 1 



Many tasks require "reasoning"— i.e., deriving conclusions from a corpus of explicitly stored 
information— to solve their range of problems. An ideal reasoning system would produce all- 
and-only the correct answers to every possible query, produce answers that are as specific 
as possible, be expressive enough to permit any possible fact to be stored and any possible 
query to be asked, and be (time) efficient 

Keywords: efficiency trade-offs, soundness/completeness/expressibility 



60 Session 13: audio processing and retrieval: Speaker change detection and tracking in 
real-time news broadcasting analysis 
Lie Lu, Hong-Jiang Zhang 

December 2002 Proceedings of the tenth ACM international conference on Multimedia 

Full text available: pdf(273.64 KB) Additional Information: full citation , abstract , references , citings 

This paper addresses the problem of real time speaker change detection and speaker 
tracking in broadcasted news video analysis. In such a case, both speaker identities and 
number of speakers are assumed unknown. A two-step speaker change detection 
algorithm, including potential change detection and refinement, is proposed. Speaker 
tracking is performed based on the results of speaker change detection. A Bayesian Fusion 
method is used to fuse multiple audio features to get a more reliable result. ... 

Keywords: audio content analysis, speaker change detection, speaker segmentation, 
speaker tracking 